
The misuse of the NASA metrics data program data sets for automated software defect prediction

Abstract

Background: The NASA Metrics Data Program data sets have been heavily used in software defect prediction experiments. Aim: To demonstrate and explain why these data sets require significant pre-processing before they are suitable for defect prediction. Method: A meticulously documented data cleansing process involving all 13 of the original NASA data sets. Results: After our novel data cleansing process, each data set had between 6 and 90 percent fewer recorded values than it originally contained. Conclusions: One: Researchers need to analyse the data that forms the basis of their findings in the context of how it will be used. Two: Defect prediction data sets could benefit from lower-level code metrics in addition to those more commonly used, as these would help to distinguish modules and reduce the likelihood of repeated data points. Three: The bulk of defect prediction experiments based on the NASA Metrics Data Program data sets may have led to erroneous findings, mainly because repeated data points can cause substantial amounts of training and testing data to be identical.
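The concern about repeated data points can be illustrated with a minimal Python sketch. It is not the authors' cleansing process, which the abstract does not detail; the function names `drop_repeated` and `overlap_fraction`, the row layout, and the toy values are all assumptions made for illustration. The sketch removes exact repeats and measures what share of a test split also appears verbatim in the training split.

```python
def drop_repeated(rows):
    """Return rows with exact repeats removed, keeping first occurrences in order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def overlap_fraction(train, test):
    """Fraction of test rows that also appear verbatim in the training rows."""
    if not test:
        return 0.0
    train_set = set(train)
    return sum(row in train_set for row in test) / len(test)


# Toy data invented for illustration, not taken from the NASA data sets.
# Hypothetical row layout: (lines_of_code, cyclomatic_complexity, defective)
modules = [
    (10, 2, 0),
    (10, 2, 0),
    (25, 5, 1),
    (10, 2, 0),
    (40, 9, 1),
    (25, 5, 1),
]

# A naive positional split: the last two rows become the test set.
train, test = modules[:4], modules[4:]

print("share of test rows already seen in training:", overlap_fraction(train, test))
print("rows removed as exact repeats:", len(modules) - len(drop_repeated(modules)))
```

With only two metrics per module, distinct modules easily collapse into identical rows, which is why the abstract suggests that additional lower-level metrics would help to tell modules apart.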